Search | VHL Regional Portal

Manipulating the Alpha Level Cannot Cure Significance Testing.

Trafimow, David; Amrhein, Valentin; Areshenkoff, Corson N; Barrera-Causil, Carlos J; Beh, Eric J; Bilgiç, Yusuf K; Bono, Roser; Bradley, Michael T; Briggs, William M; Cepeda-Freyre, Héctor A; Chaigneau, Sergio E; Ciocca, Daniel R; Correa, Juan C; Cousineau, Denis; de Boer, Michiel R; Dhar, Subhra S; Dolgov, Igor; Gómez-Benito, Juana; Grendar, Marian; Grice, James W; Guerrero-Gimenez, Martin E; Gutiérrez, Andrés; Huedo-Medina, Tania B; Jaffe, Klaus; Janyan, Armina; Karimnezhad, Ali; Korner-Nievergelt, Fränzi; Kosugi, Koji; Lachmair, Martin; Ledesma, Rubén D; Limongi, Roberto; Liuzza, Marco T; Lombardo, Rosaria; Marks, Michael J; Meinlschmidt, Gunther; Nalborczyk, Ladislas; Nguyen, Hung T; Ospina, Raydonal; Perezgonzalez, Jose D; Pfister, Roland; Rahona, Juan J; Rodríguez-Medina, David A; Romão, Xavier; Ruiz-Fernández, Susana; Suarez, Isabel; Tegethoff, Marion; Tejo, Mauricio; van de Schoot, Rens; Vankov, Ivan I; Velasco-Forero, Santiago.

Front Psychol ; 9: 699, 2018.

Article in English | MEDLINE | ID: mdl-29867666

ABSTRACT

We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p = 0.05 to p = 0.005, is deleterious for the finding of new discoveries and the progress of science. Given that blanket and variable alpha levels both are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample size much more directly than significance testing does; but none of the statistical tools should be taken as the new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, and implications for applications. To boil all this down to a binary decision based on a p-value threshold of 0.05, 0.01, 0.005, or anything else, is not acceptable.

Out with .05, in with Replication and Measurement: Isolating and Working with the Particular Effect Sizes that are Troublesome for Inferential Statistics.

Bradley, Michael T; Brand, Andrew.

J Gen Psychol ; 144(4): 309-316, 2017.

Article in English | MEDLINE | ID: mdl-29023206

ABSTRACT

It is difficult to obtain adequate power to test a small effect size with a set criterion alpha of 0.05. Probably an inferential test will indicate non-statistical significance and not be published. Rarely, statistical significance will be obtained, and an exaggerated effect size calculated and reported. Accepting all inferential probabilities and associated effect sizes could solve exaggeration problems. Graphs, generated through Monte Carlo methods, are presented to illustrate this. The first graph presents effect sizes (Cohen's d) as lines from 1 to 0 with probabilities on the Y axis and the number of measures on the X axis. This graph shows effect sizes of .5 or less should yield non-significance with sample sizes below 120 measures. The other graphs show results with as many as 10 small sample size replications. There is a convergence of means with the effect size as sample size increases and measurement accuracy emerges.

Subject(s)

Data Interpretation, Statistical , Statistics as Topic , Humans , Probability , Sample Size

Significance Testing Needs a Taxonomy: Or How the Fisher, Neyman-Pearson Controversy Resulted in the Inferential Tail Wagging the Measurement Dog.

Bradley, Michael T; Brand, Andrew.

Psychol Rep ; 119(2): 487-504, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27502529

ABSTRACT

Accurate measurement and a cutoff probability with inferential statistics are not wholly compatible. Fisher understood this when he developed the F test to deal with measurement variability and to make judgments on manipulations that may be worth further study. Neyman and Pearson focused on modeled distributions whose parameters were highly determined and concluded that inferential judgments following an F test could be made with accuracy because the distribution parameters were determined. Neyman and Pearson's approach in the application of statistical analyses using alpha and beta error rates has played a dominant role guiding inferential judgments, appropriately in highly determined situations and inappropriately in scientific exploration. Fisher tried to explain the different situations, but, in part due to some obscure wording, generated a long standing dispute that currently has left the importance of Fisher's p < .05 criteria not fully understood and a general endorsement of the Neyman and Pearson error rate approach. Problems were compounded with power calculations based on effect sizes following significant results entering into exploratory science. To understand in a practical sense when each approach should be used, a dimension reflecting varying levels of certainty or knowledge of population distributions is presented. The dimension provides a taxonomy of statistical situations and appropriate approaches by delineating four zones that represent how well the underlying population of interest is defined ranging from exploratory situations to highly determined populations.

Subject(s)

Data Interpretation, Statistical , Psychometrics/standards , Humans

Accuracy when inferential statistics are used as measurement tools.

Bradley, Michael T; Brand, Andrew.

BMC Res Notes ; 9: 241, 2016 Apr 26.

Article in English | MEDLINE | ID: mdl-27112752

ABSTRACT

BACKGROUND: Inferential statistical tests that approximate measurement are called acceptance procedures. The procedure includes type 1 error, falsely rejecting the null hypothesis, and type 2 error, failing to reject the null hypothesis when the alternative should be supported. This approach involves repeated sampling from a distribution with established parameters such that the probabilities of these errors can be ascertained. With low error probabilities the procedure has the potential to approximate measurement. How close this procedure approximates measurement was examined. FINDINGS: A Monte Carlo procedure set the type 1 error at p = 0.05 and the type 2 error at either p = 0.20 or p = 0.10 for effect size values of d = 0.2, 0.5, and 0.8. The resultant values are approximately 15 and 6.25% larger than the effect sizes entered into the analysis depending on a type 2 error rate of p < 0.20, or p < 0.10 respectively. CONCLUSIONS: Acceptance procedures approximate values wherein a decision could be made. In a health district a deviation at a particular level could signal a change in health. The approximations could be reasonable in some circumstances, but if more accurate measures are desired a deviation could be reduced by the percentage appropriate for the power. The tradeoff for such a procedure is an increase in type 1 error rate and a decrease in type 2 errors.

Subject(s)

Models, Statistical , Monte Carlo Method , Probability , Statistics as Topic/methods , Reproducibility of Results , Science/methods , Science/statistics & numerical data

The Precision of Effect Size Estimation From Published Psychological Research: Surveying Confidence Intervals.

Brand, Andrew; Bradley, Michael T.

Psychol Rep ; 118(1): 154-170, 2016 Feb.

Article in English | MEDLINE | ID: mdl-29693529

ABSTRACT

Confidence interval ( CI) widths were calculated for reported Cohen's d standardized effect sizes and examined in two automated surveys of published psychological literature. The first survey reviewed 1,902 articles from Psychological Science. The second survey reviewed a total of 5,169 articles from across the following four APA journals: Journal of Abnormal Psychology, Journal of Applied Psychology, Journal of Experimental Psychology: Human Perception and Performance, and Developmental Psychology. The median CI width for d was greater than 1 in both surveys. Hence, CI widths were, as Cohen (1994) speculated, embarrassingly large. Additional exploratory analyses revealed that CI widths varied across psychological research areas and that CI widths were not discernably decreasing over time. The theoretical implications of these findings are discussed along with ways of reducing the CI widths and thus improving precision of effect size estimation.

More Voodoo correlations: when average-based measures inflate correlations.

Brand, Andrew; Bradley, Michael T.

J Gen Psychol ; 139(4): 260-72, 2012.

Article in English | MEDLINE | ID: mdl-24837177

ABSTRACT

A Monte-Carlo simulation was conducted to assess the extent that a correlation estimate can be inflated when an average-based measure is used in a commonly employed correlational design. The results from the simulation reveal that the inflation of the correlation estimate can be substantial, up to 76%. Additionally, data was re-analyzed from two previously published studies to determine the extent that the correlation estimate was inflated due to the use of an averaged based measure. The re-analyses reveal that correlation estimates had been inflated by just over 50% in both studies. Although these findings are disconcerting, we are somewhat comforted by the fact that there is a simple and easy analysis that can be employed to prevent the inflation of the correlation estimate that we have simulated and observed.

Subject(s)

Data Interpretation, Statistical , Statistics as Topic , Humans , Monte Carlo Method , Psychology, Experimental/methods , Psychology, Experimental/standards , Statistics as Topic/methods , Statistics as Topic/standards

Multiple trials may yield exaggerated effect size estimates.

Brand, Andrew; Bradley, Michael T; Best, Lisa A; Stoica, George.

J Gen Psychol ; 138(1): 1-11, 2011.

Article in English | MEDLINE | ID: mdl-21404946

ABSTRACT

Published psychological research attempting to support the existence of small and medium effect sizes may not have enough participants to do so accurately, and thus, repeated trials or the use of multiple items may be used in an attempt to obtain significance. Through a series of Monte-Carlo simulations, this article describes the results of multiple trials or items on effect size estimates when the averages and aggregates of a dependent measure are analyzed. The simulations revealed a large increase in observed effect size estimates when the numbers of trials or items in an experiment were increased. Overestimation effects are mitigated by correlations between trials or items, but remain substantial in some cases. Some concepts, such as a P300 wave or a test score, are best defined as a composite of measures. Troubles may arise in more exploratory research where the interrelations among trials or items may not be well described.

Subject(s)

Data Collection/statistics & numerical data , Monte Carlo Method , Psychology/statistics & numerical data , Bias , Event-Related Potentials, P300/physiology , Humans , Psychological Tests/statistics & numerical data , Psychometrics/statistics & numerical data , Regression Analysis , Reproducibility of Results , Statistics as Topic

Accuracy of effect size estimates from published psychological research.

Brand, Andrew; Bradley, Michael T; Best, Lisa A; Stoica, George.

Percept Mot Skills ; 106(2): 645-9, 2008 Apr.

Article in English | MEDLINE | ID: mdl-18556917

ABSTRACT

A Monte-Carlo simulation was used to model the biasing of effect sizes in published studies. The findings from the simulation indicate that, when a predominant bias to publish studies with statistically significant results is coupled with inadequate statistical power, there will be an overestimation of effect sizes. The consequences such an effect size overestimation will then have on meta-analyses and power analyses are highlighted and discussed along with measures which can be taken to reduce the problem.

Subject(s)

Psychology/statistics & numerical data , Publishing/statistics & numerical data , Research/standards , Humans

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL